A Scalable P2P RIA Crawling System with Partial Knowledge
Authors
Abstract
Rich Internet Applications (RIAs) are widely used because they are interactive and user-friendly. Automated tools for crawling RIAs have become necessary for many reasons, such as content indexing and testing for correctness and security. Because RIAs can be large, distributed crawling has been introduced to reduce the time required for crawling. However, relying on a single controller may create a performance bottleneck, since one database is accessed simultaneously by many crawlers. Such a design is also vulnerable to complete data loss if the node hosting the storage unit fails. We present a distributed, decentralized scheme for crawling large-scale RIAs that partitions the search space among several controllers, each of which stores only part of the information; this allows for fault tolerance and scalability. Our results are significantly better than those of non-distributed crawling, and the scheme outperforms distributed crawling with a single coordinator.
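The abstract does not say how the search space is divided among controllers. As a rough illustration only, the sketch below assumes a simple hash-based assignment in which each discovered application state is owned by exactly one controller, so every controller holds only partial knowledge of the crawl. The names `controller_for`, `Controller`, and `report_transition` are hypothetical and do not come from the paper.

```python
import hashlib


def controller_for(state_id: str, num_controllers: int) -> int:
    """Map a state identifier to the index of its owning controller.

    Assumption: ownership is decided by hashing the state ID. The paper's
    actual partitioning rule is not given in the abstract.
    """
    digest = hashlib.sha256(state_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_controllers


class Controller:
    """Hypothetical controller that stores only the states it owns."""

    def __init__(self, index: int) -> None:
        self.index = index
        # state_id -> set of events already executed from that state
        self.states: dict[str, set[str]] = {}

    def record(self, state_id: str, event: str) -> None:
        self.states.setdefault(state_id, set()).add(event)


def report_transition(controllers: list[Controller], state_id: str, event: str) -> int:
    """Send an executed event to the controller that owns the state."""
    owner = controllers[controller_for(state_id, len(controllers))]
    owner.record(state_id, event)
    return owner.index


if __name__ == "__main__":
    controllers = [Controller(i) for i in range(4)]
    for state, event in [("s0", "click#menu"), ("s1", "click#next"), ("s0", "hover#item")]:
        owner = report_transition(controllers, state, event)
        print(f"{state} / {event} -> controller {owner}")
```

In this sketch no controller ever sees the full state graph, which is the property the abstract attributes to storing information partially. Replicating each partition on a second controller would be one possible route to the fault tolerance the abstract mentions, but that is not shown here.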
Related resources
GDist-RIA Crawler: A Greedy Distributed Crawler for Rich Internet Applications
Crawling web applications is important for indexing, accessibility and security assessment. Crawling traditional web applications is an old problem, for which good and efficient solutions are known. Crawling Rich Internet Applications (RIA) quickly and efficiently, however, is an open problem. Technologies such as AJAX and partial Document Object Model (DOM) updates only make the problem of craw...
Query Planning in the PORDaS P2P Database System
Computational science is a rapidly growing multidisciplinary field that has a need for scalable, distributed, and efficient data management. In our research, we see the peer-to-peer (P2P) paradigm as a possible solution to some of the problems in distributed data management. P2P has already proved to be suitable in contexts like file sharing, distributed computations, and distributed search. In ou...
P2P Network Trust Management Survey
Peer-to-peer (P2P) applications are no longer limited to home users and are starting to be accepted in academic and corporate environments. While file sharing and instant messaging applications are the most traditional examples, they are no longer the only ones benefiting from the potential advantages of P2P networks. For example, network file storage, data transmission, distributed computing, and co...
GGRA: a grouped gossip-based reputation aggregation algorithm
An important issue in P2P networks is the existence of malicious nodes that decrease the performance of such networks. A reputation system, in which nodes are ranked based on their behavior, is one of the proposed solutions for detecting and isolating malicious (low-ranked) nodes. Gossip Trust is an interesting previously proposed algorithm for reputation aggregation in P2P networks based on t...
Exploiting locality for scalable information retrieval in peer-to-peer networks
An important problem in unstructured peer-to-peer (P2P) networks is the efficient content-based retrieval of documents shared by other peers. However, existing search mechanisms do not scale well because they are either based on flooding the network with queries or require some form of global knowledge. We propose the Intelligent Search Mechanism (ISM) which is an...
Journal title:
Volume, issue:
Pages: -
Publication date: 2014